GNU Info File | 1993-10-03 | 50KB | 945 lines
This is Info file gawk.info, produced by Makeinfo-1.47 from the input
file gawk.texi.
This file documents `awk', a program that you can use to select
particular records in a file and perform operations upon them.
This is Edition 0.14 of `The GAWK Manual',
for the 2.14 version of the GNU implementation
of AWK.
Copyright (C) 1989, 1991, 1992 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.
Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.
File: gawk.info, Node: Extracting, Next: Distribution contents, Prev: Gawk Distribution, Up: Gawk Distribution
Getting the `gawk' Distribution
-------------------------------
`gawk' is distributed as a compressed `tar' file. You can get it
via anonymous `ftp' to the Internet host `prep.ai.mit.edu'. Like all
GNU software, it will be archived at other well known systems, from
which it will be possible to use some sort of anonymous `uucp' to
obtain the distribution as well.
Once you have the distribution (for example, `gawk-2.14.0.tar.Z'),
first use `uncompress' to expand the file, and then use `tar' to
extract it. `uncompress' usually has a link named `zcat', which causes
it to decompress the file to the standard output. You can use the
following pipeline to produce the `gawk' distribution:
# Under System V, add 'o' to the tar flags
zcat gawk-2.14.0.tar.Z | tar -xvpf -
This will create a directory named `gawk-2.14' in the current directory.
The distribution file name is of the form `gawk-2.14.N.tar.Z'. The N
represents a "patchlevel", meaning that minor bugs have been fixed in
the major release. The current patchlevel is 0, but when retrieving
distributions, you should get the version with the highest patchlevel.
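As a rough illustration of choosing the highest patchlevel, the following sketch compares the N component of each candidate file name numerically. The helper name `highest_patchlevel' and the listed file names are invented for the example; they are not part of the distribution.

```shell
# Hypothetical helper: read distribution file names on standard input
# and print the one with the highest patchlevel N in gawk-2.14.N.tar.Z.
highest_patchlevel() {
    awk -F. '
        # Fields split on ".": gawk-2, 14, N, tar, Z
        /^gawk-2\.14\.[0-9]+\.tar\.Z$/ {
            if (best == "" || $3 + 0 > max) { max = $3 + 0; best = $0 }
        }
        END { if (best != "") print best }
    '
}

# Example listing (invented names, for illustration only)
printf '%s\n' gawk-2.14.0.tar.Z gawk-2.14.10.tar.Z gawk-2.14.2.tar.Z |
    highest_patchlevel
```

The comparison is numeric (`$3 + 0'), so patchlevel 10 ranks above patchlevel 2; a plain string sort would get this wrong.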
If you are not on a Unix system, you will need to make other
arrangements for getting and extracting the `gawk' distribution. You
should consult a local expert.
File: gawk.info, Node: Distribution contents, Prev: Extracting, Up: Gawk Distribution
Contents of the `gawk' Distribution
-----------------------------------
`gawk' has a number of C source files, documentation files,
subdirectories and files related to the configuration process (*note
Compiling and Installing `gawk' on Unix: Unix Installation.), and
several subdirectories related to different, non-Unix, operating
systems.
various `.c', `.y', and `.h' files
The C and YACC source files are the actual `gawk' source code.
`README'
`README.VMS'
`README.dos'
`README.rs6000'
`README.ultrix'
Descriptive files: `README' for `gawk' under Unix, and the rest
for the various hardware and software combinations.
`PORTS'
A list of systems to which `gawk' has been ported, and which have
successfully run the test suite.
`ACKNOWLEDGMENT'
A list of the people who contributed major parts of the code or
documentation.
`NEWS'
A list of changes to `gawk' since the last release or patch.
`COPYING'
The GNU General Public License.
`FUTURES'
A brief list of features and/or changes being contemplated for
future releases, with some indication of the time frame for the
feature, based on its difficulty.
`LIMITATIONS'
A list of those factors that limit `gawk''s performance. Most of
these depend on the hardware or operating system software, and are
not limits in `gawk' itself.
`PROBLEMS'
A file describing known problems with the current release.
`gawk.1'
The `troff' source for a manual page describing `gawk'.
`gawk.texinfo'
The `texinfo' source file for this Info file. It should be
processed with TeX to produce a printed manual, and with
`makeinfo' to produce the Info file.
`Makefile.in'
`config'
`config.in'
`configure'
`missing'
`mungeconf'
These files and subdirectories are used when configuring `gawk'
for various Unix systems. They are explained in detail in *Note
Compiling and Installing `gawk' on Unix: Unix Installation.
`atari'
Files needed for building `gawk' on an Atari ST. *Note Installing
`gawk' on the Atari ST: Atari Installation, for details.
`pc'
Files needed for building `gawk' under MS-DOS. *Note Installing
`gawk' on MS-DOS: MS-DOS Installation, for details.
`vms'
Files needed for building `gawk' under VMS. *Note Compiling,
Installing, and Running `gawk' on VMS: VMS Installation, for
details.
`test'
Many interesting `awk' programs, provided as a test suite for
`gawk'. You can use `make test' from the top level `gawk'
directory to run your version of `gawk' against the test suite.
There are many programs here that are useful in their own right.
If `gawk' successfully passes `make test' then you can be
confident of a successful port.
File: gawk.info, Node: Unix Installation, Next: VMS Installation, Prev: Gawk Distribution, Up: Installation
Compiling and Installing `gawk' on Unix
=======================================
Often, you can compile and install `gawk' by typing only two
commands. However, if you do not use a supported system, you may need
to configure `gawk' for your system yourself.
* Menu:
* Quick Installation:: Compiling `gawk' on a
supported Unix version.
* Configuration Philosophy:: How it's all supposed to work.
* New Configurations:: What to do if there is no supplied
configuration for your system.
File: gawk.info, Node: Quick Installation, Next: Configuration Philosophy, Prev: Unix Installation, Up: Unix Installation
Compiling `gawk' for a Supported Unix Version
---------------------------------------------
After you have extracted the `gawk' distribution, `cd' to
`gawk-2.14'. Look in the `config' subdirectory for a file that matches
your hardware/software combination. In general, only the software is
relevant; for example `sunos41' is used for SunOS 4.1, on both Sun 3
and Sun 4 hardware.
If you find such a file, run the command:
# assume you have SunOS 4.1
./configure sunos41
This produces a `Makefile' and `config.h' tailored to your system.
You may wish to edit the `Makefile' to use a different C compiler, such
as `gcc', the GNU C compiler, if you have it. You may also wish to
change the `CFLAGS' variable, which controls the command line options
that are passed to the C compiler (such as optimization levels, or
compiling for debugging).
After you have configured `Makefile' and `config.h', type:
make
and shortly thereafter, you should have an executable version of `gawk'.
That's all there is to it!
File: gawk.info, Node: Configuration Philosophy, Next: New Configurations, Prev: Quick Installation, Up: Unix Installation
The Configuration Process
-------------------------
(This section is of interest only if you know something about using
the C language and the Unix operating system.)
The source code for `gawk' generally attempts to adhere to industry
standards wherever possible. This means that `gawk' uses library
routines that are specified by the ANSI C standard and by the POSIX
operating system interface standard. When using an ANSI C compiler,
function prototypes are provided to help improve the compile-time
checking.
Many older Unix systems do not support all of either the ANSI or the
POSIX standards. The `missing' subdirectory in the `gawk' distribution
contains replacement versions of those subroutines that are most likely
to be missing.
The `config.h' file that is created by the `configure' program
contains definitions that describe features of the particular operating
system where you are attempting to compile `gawk'. For the most part,
it lists which standard subroutines are *not* available. For example,
if your system lacks the `getopt' routine, then `GETOPT_MISSING' would
be defined.
`config.h' also defines constants that describe facts about your
variant of Unix. For example, there may not be an `st_blksize' element
in the `stat' structure. In this case `BLKSIZE_MISSING' would be
defined.
Based on the list in `config.h' of standard subroutines that are
missing, `missing.c' will do a `#include' of the appropriate file(s)
from the `missing' subdirectory.
Conditionally compiled code in the other source files relies on the
other definitions in the `config.h' file.
Besides creating `config.h', `configure' produces a `Makefile' from
`Makefile.in'. There are a number of lines in `Makefile.in' that are
system or feature specific. For example, there is a line that begins
with `##MAKE_ALLOCA_C##'. This is normally a comment line, since it
starts with `#'. If a configuration file has `MAKE_ALLOCA_C' in it,
then `configure' will delete the `##MAKE_ALLOCA_C##' from the beginning
of the line. This will enable the rules in the `Makefile' that use a C
version of `alloca'. There are several similar features that work in
this fashion.
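The marker mechanism can be pictured with a small sketch. This is not the `configure' script from the distribution; the helper name `enable_feature' and the two template lines are assumptions made up for illustration. It only shows the idea of deleting a `##NAME##' prefix so that the rest of the line becomes active.

```shell
# Sketch only: mimic how a marker such as ##MAKE_ALLOCA_C## could be
# stripped from the start of matching lines. This is not the actual
# configure script shipped with gawk.
enable_feature() {
    # $1 is the feature name; the Makefile template is read from stdin.
    awk -v feature="$1" '
        BEGIN { marker = "##" feature "##" }
        # index() == 1 means the line begins with the marker
        index($0, marker) == 1 { $0 = substr($0, length(marker) + 1) }
        { print }
    '
}

# Invented two-line template for illustration
printf '%s\n' '##MAKE_ALLOCA_C##ALLOCA = alloca.o' '##MAKE_OTHER##X = y' |
    enable_feature MAKE_ALLOCA_C
```

Only the line carrying the requested marker is uncommented; the other line still starts with `#' and remains a comment.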
File: gawk.info, Node: New Configurations, Prev: Configuration Philosophy, Up: Unix Installation
Configuring `gawk' for a New System
-----------------------------------
(This section is of interest only if you know something about using
the C language and the Unix operating system, and if you have to install
`gawk' on a system that is not supported by the `gawk' distribution. If
you are a C or Unix novice, get help from a local expert.)
If you need to configure `gawk' for a Unix system that is not
supported in the distribution, first see *Note The Configuration
Process: Configuration Philosophy. Then, copy `config.in' to
`config.h', and copy `Makefile.in' to `Makefile'.
Next, edit both files. Both files are liberally commented, and the
necessary changes should be straightforward.
While editing `config.h', you need to determine what library
routines you do or do not have by consulting your system documentation,
or by perusing your actual libraries using the `ar' or `nm' utilities.
In the worst case, simply do not define *any* of the macros for missing
subroutines. When you compile `gawk', the final link-editing step will
fail. The link editor will provide you with a list of unresolved
external references--these are the missing subroutines. Edit
`config.h' again and recompile, and you should be set.
Editing the `Makefile' should also be straightforward. Enable or
disable the lines that begin with `##MAKE_WHATEVER##', as appropriate.
Select the correct C compiler and `CFLAGS' for it. Then run `make'.
Getting a correct configuration is likely to be an iterative process.
Do not be discouraged if it takes you several tries. If you have no
luck whatsoever, please report your system type, and the steps you took.
Once you do have a working configuration, please send it to the
maintainers so that support for your system can be added to the
official release.
*Note Reporting Problems and Bugs: Bugs, for information on how to
report problems in configuring `gawk'. You may also use the same
mechanisms for sending in new configurations.
File: gawk.info, Node: VMS Installation, Next: MS-DOS Installation, Prev: Unix Installation, Up: Installation
Compiling, Installing, and Running `gawk' on VMS
================================================
This section describes how to compile and install `gawk' under VMS.
* Menu:
* VMS Compilation:: How to compile `gawk' under VMS.
* VMS Installation Details:: How to install `gawk' under VMS.
* VMS Running:: How to run `gawk' under VMS.
* VMS POSIX:: Alternate instructions for VMS POSIX.
File: gawk.info, Node: VMS Compilation, Next: VMS Installation Details, Prev: VMS Installation, Up: VMS Installation
Compiling `gawk' under VMS
--------------------------
To compile `gawk' under VMS, there is a `DCL' command procedure that
will issue all the necessary `CC' and `LINK' commands, and there is
also a `Makefile' for use with the `MMS' utility. From the source
directory, use either
$ @[.VMS]VMSBUILD.COM
or
$ MMS/DESCRIPTION=[.VMS]DESCRIP.MMS GAWK
Depending upon which C compiler you are using, follow one of the sets
of instructions in this table:
VAX C V3.x
Use either `vmsbuild.com' or `descrip.mms' as is. These use
`CC/OPTIMIZE=NOLINE', which is essential for Version 3.0.
VAX C V2.x
You must have Version 2.3 or 2.4; older ones won't work. Edit
either `vmsbuild.com' or `descrip.mms' according to the comments
in them. For `vmsbuild.com', this just entails removing two `!'
delimiters. Also edit `config.h' (which is a copy of file
`[.config]vms-conf.h') and comment out or delete the two lines
`#define __STDC__ 0' and `#define VAXC_BUILTINS' near the end.
GNU C
Edit `vmsbuild.com' or `descrip.mms'; the changes are different
from those for VAX C V2.x, but equally straightforward. No
changes to `config.h' should be needed.
DEC C
Edit `vmsbuild.com' or `descrip.mms' according to their comments.
No changes to `config.h' should be needed.
`gawk' 2.14 has been tested under VAX/VMS 5.5-1 using VAX C V3.2,
GNU C 1.40 and 2.3. It should work without modifications for VMS V4.6
and up.
File: gawk.info, Node: VMS Installation Details, Next: VMS Running, Prev: VMS Compilation, Up: VMS Installation
Installing `gawk' on VMS
------------------------
To install `gawk', all you need is a "foreign" command, which is a
`DCL' symbol whose value begins with a dollar sign.
$ GAWK :== $device:[directory]GAWK
(Substitute the actual location of `gawk.exe' for
`device:[directory]'.) The symbol should be placed in the `login.com'
of any user who wishes to run `gawk', so that it will be defined every
time the user logs on. Alternatively, the symbol may be placed in the
system-wide `sylogin.com' procedure, which will allow all users to run
`gawk'.
Optionally, the help entry can be loaded into a VMS help library:
$ LIBRARY/HELP SYS$HELP:HELPLIB [.VMS]GAWK.HLP
(You may want to substitute a site-specific help library rather than
the standard VMS library `HELPLIB'.) After loading the help text,
$ HELP GAWK
will provide information about both the `gawk' implementation and the
`awk' programming language.
The logical name `AWK_LIBRARY' can designate a default location for
`awk' program files. For the `-f' option, if the specified filename
has no device or directory path information in it, `gawk' will look in
the current directory first, then in the directory specified by the
translation of `AWK_LIBRARY' if the file was not found. If after
searching in both directories, the file still is not found, then `gawk'
appends the suffix `.awk' to the filename and the file search will be
re-tried. If `AWK_LIBRARY' is not defined, that portion of the file
search will fail benignly.
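The search order just described can be sketched as follows. The sketch is written as a Unix shell function purely for illustration; `find_awk_program' and its two arguments are hypothetical, and the real logic is inside the VMS port of `gawk'.

```shell
# Sketch of the lookup order described above. find_awk_program and its
# arguments are hypothetical names used only for this illustration.
find_awk_program() {
    name=$1      # file name given to -f, without directory information
    libdir=$2    # stand-in for the translation of AWK_LIBRARY (may be empty)
    for suffix in "" ".awk"; do
        # Current directory first
        if [ -f "./$name$suffix" ]; then
            echo "./$name$suffix"; return 0
        fi
        # Then the library directory, if one is defined
        if [ -n "$libdir" ] && [ -f "$libdir/$name$suffix" ]; then
            echo "$libdir/$name$suffix"; return 0
        fi
    done
    return 1     # not found anywhere
}
```

Passing an empty second argument models an undefined `AWK_LIBRARY': that part of the search is simply skipped.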
File: gawk.info, Node: VMS Running, Next: VMS POSIX, Prev: VMS Installation Details, Up: VMS Installation
Running `gawk' on VMS
---------------------
Command line parsing and quoting conventions are significantly
different on VMS, so examples in this manual or from other sources
often need minor changes. They *are* minor though, and all `awk'
programs should run correctly.
Here are a couple of trivial tests:
$ gawk -- "BEGIN {print ""Hello, World!""}"
$ gawk -"W" version ! could also be -"W version" or "-W version"
Note that upper-case and mixed-case text must be quoted.
The VMS port of `gawk' includes a `DCL'-style interface in addition
to the original shell-style interface (see the help entry for details).
One side-effect of dual command line parsing is that if there is only a
single parameter (as in the quoted string program above), the command
becomes ambiguous. To work around this, the normally optional `--'
flag is required to force Unix style rather than `DCL' parsing. If any
other dash-type options (or multiple parameters such as data files to be
processed) are present, there is no ambiguity and `--' can be omitted.
The default search path when looking for `awk' program files
specified by the `-f' option is `"SYS$DISK:[],AWK_LIBRARY:"'. The
logical name `AWKPATH' can be used to override this default. The format
of `AWKPATH' is a comma-separated list of directory specifications.
When defining it, the value should be quoted so that it retains a single
translation, and not a multi-translation `RMS' searchlist.
File: gawk.info, Node: VMS POSIX, Prev: VMS Running, Up: VMS Installation
Building and using `gawk' under VMS POSIX
-----------------------------------------
Ignore the instructions above, although `vms/gawk.hlp' should still
be made available in a help library. Make sure that the two scripts,
`configure' and `mungeconf', are executable; use `chmod +x' on them if
necessary. Then execute the following commands:
$ POSIX
psx> configure vms-posix
psx> make awktab.c gawk
The `configure' command will construct files `config.h' and `Makefile' out of
templates. The `make' command will compile and link `gawk'. Due to a
`make' bug in VMS POSIX V1.0 and V1.1, the file `awktab.c' must be
given as an explicit target or it will not be built and the final link
step will fail. Ignore the warning `"Could not find lib m in lib
list"'; it is harmless, caused by the explicit use of `-lm' as a linker
option which is not needed under VMS POSIX. Under V1.1 (but not V1.0)
a problem with the `yacc' skeleton `/etc/yyparse.c' will cause a
compiler warning for `awktab.c', followed by a linker warning about
compilation warnings in the resulting object module. These warnings
can be ignored.
Once built, `gawk' will work like any other shell utility. Unlike
the normal VMS port of `gawk', no special command line manipulation is
needed in the VMS POSIX environment.
File: gawk.info, Node: MS-DOS Installation, Next: Atari Installation, Prev: VMS Installation, Up: Installation
Installing `gawk' on MS-DOS
===========================
The first step is to get all the files in the `gawk' distribution
onto your PC. Move all the files from the `pc' directory into the main
directory where the other files are. Edit the file `make.bat' so that
it will be an acceptable MS-DOS batch file. This means making sure that
all lines are terminated with the ASCII carriage return and line feed
characters.
`gawk' has only been compiled with version 5.1 of the Microsoft C
compiler. The file `make.bat' from the `pc' directory assumes that you
have this compiler.
Copy the file `setargv.obj' from the library directory where it
resides to the `gawk' source code directory.
Run `make.bat'. This will compile `gawk' for you, and link it.
That's all there is to it!
File: gawk.info, Node: Atari Installation, Prev: MS-DOS Installation, Up: Installation
Installing `gawk' on the Atari ST
=================================
This section assumes that you are running TOS. It applies to other
Atari models (STe, TT) as well.
In order to use `gawk', you need to have a shell, either text or
graphics, that does not map all the characters of a command line to
upper case. Maintaining case distinction in option flags is very
important (*note Invoking `awk': Command Line.). Popular shells like
`gulam' or `gemini' will work, as will newer versions of `desktop'.
Support for I/O redirection is necessary to make it easy to import
`awk' programs from other environments. Pipes are nice to have, but
not vital.
If you have received an executable version of `gawk', place it, as
usual, anywhere in your `PATH' where your shell will find it.
While executing, `gawk' creates a number of temporary files. `gawk'
looks for either of the environment variables `TEMP' or `TMPDIR', in
that order. If either one is found, its value is assumed to be a
directory for temporary files. This directory must exist, and if you
can spare the memory, it is a good idea to put it on a RAM drive. If
neither `TEMP' nor `TMPDIR' is found, then `gawk' uses the current
directory for its temporary files.
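A minimal sketch of this lookup order, written for a Unix shell with `awk' rather than for TOS, might look like the following. The helper name `temp_dir' is hypothetical, and the sketch assumes that "found" means the variable is present in the environment, even if empty.

```shell
# Sketch of the lookup order: TEMP first, then TMPDIR, else the current
# directory. temp_dir is a hypothetical helper, not a gawk function.
temp_dir() {
    awk 'BEGIN {
        if ("TEMP" in ENVIRON)        dir = ENVIRON["TEMP"]
        else if ("TMPDIR" in ENVIRON) dir = ENVIRON["TMPDIR"]
        else                          dir = "."
        print dir
    }'
}
```

With both variables set, `TEMP' wins; with neither set, the function prints `.' for the current directory.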
The ST version of `gawk' searches for its program files as described
in *Note The `AWKPATH' Environment Variable: AWKPATH Variable. On the
ST, the default value for the `AWKPATH' variable is
`".,c:\lib\awk,c:\gnu\lib\awk"'. The search path can be modified by
explicitly setting `AWKPATH' to whatever you wish. Note that colons
cannot be used on the ST to separate elements in the `AWKPATH'
variable, since they have another, reserved, meaning. Instead, you
must use a comma to separate elements in the path. If you are
recompiling `gawk' on the ST, then you can choose a new default search
path, by setting the value of `DEFPATH' in the file `...\config\atari'.
You may choose a different separator character by setting the value of
`ENVSEP' in the same file. The new values will be used when creating
the header file `config.h'.
Although `awk' allows great flexibility in doing I/O redirections
from within a program, this facility should be used with care on the ST.
In some circumstances the OS routines for file handle pool processing
lose track of certain events, causing the computer to crash, and
requiring a reboot. Often a warm reboot is sufficient. Fortunately,
this happens infrequently, and in rather esoteric situations. In
particular, avoid having one part of an `awk' program using `print'
statements explicitly redirected to `"/dev/stdout"', while other
`print' statements use the default standard output, and a calling shell
has redirected standard output to a file.
When `gawk' is compiled with the ST version of `gcc' and its usual
libraries, it will accept both `/' and `\' as path separators. While
this is convenient, it should be remembered that this removes one,
technically legal, character (`/') from your file names, and that it
may create problems for external programs, called via the `system()'
function, which may not support this convention. Whenever it is
possible that a file created by `gawk' will be used by some other
program, use only backslashes. Also remember that in `awk',
backslashes in strings have to be doubled in order to get literal
backslashes.
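For instance, to produce the directory name `c:\lib\awk' from within an `awk' program, each backslash is written twice in the string constant. The command is shown as it would be typed in a Unix-style shell with single quotes; quoting rules in ST shells may differ.

```shell
# Each doubled backslash in the awk string yields one literal backslash.
awk 'BEGIN { path = "c:\\lib\\awk"; print path }'
```

This prints `c:\lib\awk'.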
The initial port of `gawk' to the ST was done with `gcc'. If you
wish to recompile `gawk' from scratch, you will need to use a compiler
that accepts ANSI standard C (such as `gcc', Turbo C, or Prospero C).
If `sizeof(int) != sizeof(int *)', the correctness of the generated
code depends heavily on the fact that all function calls have function
prototypes in the current scope. If your compiler does not accept
function prototypes, you will probably have to add a number of casts to
the code.
If you are using `gcc', make sure that you have up-to-date libraries.
Older versions have problems with some library functions (`atan2()',
`strftime()', the `%g' conversion in `sprintf()') which may affect the
operation of `gawk'.
In the `atari' subdirectory of the `gawk' distribution is a version
of the `system()' function that has been tested with `gulam' and `msh';
it should work with other shells as well. With `gulam', it passes the
string to be executed without spawning an extra copy of a shell. It is
possible to replace this version of `system()' with a similar function
from a library or from some other source if that version would be a
better choice for the shell you prefer.
The files needed to recompile `gawk' on the ST can be found in the
`atari' directory. The provided files and instructions below assume
that you have the GNU C compiler (`gcc'), the `gulam' shell, and an ST
version of `sed'. The `Makefile' is set up to use `byacc' as a `yacc'
replacement. With a different set of tools some adjustments and/or
editing will be needed.
`cd' to the `atari' directory. Copy `Makefile.st' to `makefile' in
the source (parent) directory. Possibly adjust `../config/atari' to
suit your system. Execute the script `mkconf.g' which will create the
header file `../config.h'. Go back to the source directory. If you
are not using `gcc', check the file `missing.c'. It may be necessary to
change forward slashes in the references to files from the `atari'
subdirectory into backslashes. Type `make' and enjoy.
Compilation with `gcc' of some of the bigger modules, like
`awk_tab.c', may require a full four megabytes of memory. On smaller
machines you would need to cut down on optimizations, or you would have
to switch to another, less memory hungry, compiler.
File: gawk.info, Node: Gawk Summary, Next: Sample Program, Prev: Installation, Up: Top
`gawk' Summary
**************
This appendix provides a brief summary of the `gawk' command line
and the `awk' language. It is designed to serve as a "quick reference."
It is therefore terse, but complete.
* Menu:
* Command Line Summary:: Recapitulation of the command line.
* Language Summary:: A terse review of the language.
* Variables/Fields:: Variables, fields, and arrays.
* Rules Summary:: Patterns and Actions, and their
component parts.
* Functions Summary:: Defining and calling functions.
* Historical Features:: Some undocumented but supported "features".
File: gawk.info, Node: Command Line Summary, Next: Language Summary, Prev: Gawk Summary, Up: Gawk Summary
Command Line Options Summary
============================
The command line consists of options to `gawk' itself, the `awk'
program text (if not supplied via the `-f' option), and values to be
made available in the `ARGC' and `ARGV' predefined `awk' variables:
awk [`-FFS'] [`-W' GAWK-OPTS] [`-v VAR=VAL'] [`--'] 'PROGRAM' FILE ...
awk [`-FFS'] [`-W' GAWK-OPTS] [`-v VAR=VAL'] `-f' SOURCE-FILE [`-f SOURCE-FILE ...'] FILE ...
The options that `gawk' accepts are:
`-FFS'
Use FS for the input field separator (the value of the `FS'
predefined variable).
`-f PROGRAM-FILE'
Read the `awk' program source from the file PROGRAM-FILE, instead
of from the first command line argument.
`-v VAR=VAL'
Assign the variable VAR the value VAL before program execution
begins.
`-W compat'
Specifies compatibility mode, in which `gawk' extensions are turned
off.
`-W posix'
Specifies POSIX compatibility mode, in which `gawk' extensions are
turned off and additional restrictions apply.
`-W version'
Print version information for this particular copy of `gawk' on
the error output. This option may disappear in a future version
of `gawk'.
`-W copyleft'
`-W copyright'
Print the short version of the General Public License on the error
output. This option may disappear in a future version of `gawk'.
`-W lint'
Give warnings about dubious or non-portable `awk' constructs.
`--'
Signal the end of options. This is useful to allow further
arguments to the `awk' program itself to start with a `-'. This
is mainly for consistency with the argument parsing conventions of
POSIX.
Any other options are flagged as invalid, but are otherwise ignored.
*Note Invoking `awk': Command Line, for more details.
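As a small illustration of two of these options, the following command combines `-F' and `-v'. The input line and the variable name `col' are made up for the example.

```shell
# -F sets FS to ":"; -v assigns col before the program starts running.
printf 'root:x:0\n' | awk -F: -v col=3 '{ print $col }'
```

This prints `0', the third colon-separated field.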
File: gawk.info, Node: Language Summary, Next: Variables/Fields, Prev: Command Line Summary, Up: Gawk Summary
Language Summary
================
An `awk' program consists of a sequence of pattern-action statements
and optional function definitions.
PATTERN { ACTION STATEMENTS }
function NAME(PARAMETER LIST) { ACTION STATEMENTS }
`gawk' first reads the program source from the PROGRAM-FILE(s) if
specified, or from the first non-option argument on the command line.
The `-f' option may be used multiple times on the command line. `gawk'
reads the program text from all the PROGRAM-FILE files, effectively
concatenating them in the order they are specified. This is useful for
building libraries of `awk' functions, without having to include them
in each new `awk' program that uses them. To use a library function in
a file from a program typed in on the command line, specify `-f
/dev/tty'; then type your program, and end it with a `Control-d'. *Note
Invoking `awk': Command Line.
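The effect of repeated `-f' options can be sketched with two small files, one holding a function and one holding the rules that call it. The file names `lib.awk' and `main.awk' and the function `double' are invented for illustration, and the sketch assumes a `mktemp' command is available.

```shell
# Sketch: keep a function in one file and the main rules in another,
# then load both with repeated -f options. File names are invented.
dir=$(mktemp -d)
cat > "$dir/lib.awk" <<'EOF'
function double(x) { return 2 * x }
EOF
cat > "$dir/main.awk" <<'EOF'
{ print double($1) }
EOF
result=$(printf '21\n' | awk -f "$dir/lib.awk" -f "$dir/main.awk")
echo "$result"
rm -rf "$dir"
```

The two files are read as if they were one program, so the rule in `main.awk' can call the function defined in `lib.awk'; the command prints `42'.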
The environment variable `AWKPATH' specifies a search path to use
when finding source files named with the `-f' option. The default
path, which is `.:/usr/lib/awk:/usr/local/lib/awk' is used if `AWKPATH'
is not set. If a file name given to the `-f' option contains a `/'
character, no path search is performed. *Note The `AWKPATH' Environment
Variable: AWKPATH Variable, for a full description of the `AWKPATH'
environment variable.
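The search described here can be approximated by listing the candidate locations in order. This is only a sketch under the rules stated above; the helper `awkpath_candidates' and the file name `stats.awk' are hypothetical.

```shell
# Sketch: list the candidate locations for a -f file name, in order.
# awkpath_candidates is a hypothetical helper for illustration only.
awkpath_candidates() {
    # $1 = file name, $2 = search path (colon-separated)
    awk -v name="$1" -v path="$2" 'BEGIN {
        if (index(name, "/") > 0) {   # a slash disables the path search
            print name
            exit
        }
        n = split(path, dirs, ":")
        for (i = 1; i <= n; i++)
            print dirs[i] "/" name
    }'
}

awkpath_candidates stats.awk ".:/usr/lib/awk:/usr/local/lib/awk"
```

With the default path, the candidates are `./stats.awk', `/usr/lib/awk/stats.awk' and `/usr/local/lib/awk/stats.awk', in that order. A name such as `lib/stats.awk' is returned unchanged.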
`gawk' compiles the program into an internal form, and then proceeds
to read each file named in the `ARGV' array. If there are no files
named on the command line, `gawk' reads the standard input.
If a "file" named on the command line has the form `VAR=VAL', it is
treated as a variable assignment: the variable VAR is assigned the
value VAL. If any of the files have a value that is the null string,
that element in the list is skipped.
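A short illustration of a command line assignment follows; the variable name `unit' and the input are invented. The operand `-' stands for the standard input.

```shell
# The operand unit=km is taken as an assignment, not as a file name;
# "-" names the standard input.
printf '5\n' | awk '{ print $1, unit }' unit=km -
```

This prints `5 km'. The assignment takes effect when `awk' reaches that operand, which is why it is placed before the input it should apply to.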
For each line in the input, `gawk' tests to see if it matches any
PATTERN in the `awk' program. For each pattern that the line matches,
the associated ACTION is executed.
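As a brief example of this matching loop, the following two-rule program is run over two made-up input lines.

```shell
# Every rule whose pattern matches the line runs, in program order.
printf 'apple\nbanana\n' | awk '
    /an/ { print "has an: " $0 }
    /^b/ { print "starts with b: " $0 }
'
```

The line `apple' matches neither pattern and produces no output; `banana' matches both, so both actions run for it.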
File: gawk.info, Node: Variables/Fields, Next: Rules Summary, Prev: Language Summary, Up: Gawk Summary
Variables and Fields
====================
`awk' variables are dynamic; they come into existence when they are
first used. Their values are either floating-point numbers or strings.
`awk' also has one-dimensional arrays; multiple-dimensional arrays may be
simulated. There are several predefined variables that `awk' sets as a
program runs; these are summarized below.
* Menu:
* Fields Summary:: Input field splitting.
* Built-in Summary:: `awk''s built-in variables.
* Arrays Summary:: Using arrays.
* Data Type Summary:: Values in `awk' are numbers or strings.
File: gawk.info, Node: Fields Summary, Next: Built-in Summary, Prev: Variables/Fields, Up: Variables/Fields
Fields
------
As each input line is read, `gawk' splits the line into FIELDS,
using the value of the `FS' variable as the field separator. If `FS'
is a single character, fields are separated by that character.
Otherwise, `FS' is expected to be a full regular expression. In the
special case that `FS' is a single blank, fields are separated by runs
of blanks and/or tabs. Note that the value of `IGNORECASE' (*note
Case-sensitivity in Matching: Case-sensitivity.) also affects how
fields are split when `FS' is a regular expression.
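The three cases for `FS' can be compared side by side. The input strings are invented for illustration.

```shell
# Single-character FS: fields are separated by that exact character.
printf 'a:b::d\n' | awk -F: '{ print NF }'
# Regular-expression FS: one or more commas or semicolons.
printf 'a,;b;c\n' | awk -F'[,;]+' '{ print NF }'
# Default FS (a single blank): runs of blanks and tabs count once.
printf '  a \t b  \n' | awk '{ print NF }'
```

The commands print `4', `3' and `2' respectively. In the first case the empty string between the two adjacent colons counts as a field of its own.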
Each field in the input line may be referenced by its position, `$1',
`$2', and so on. `$0' is the whole line. The value of a field may be
assigned to as well. Field numbers need not be constants:
n = 5
print $n
prints the fifth field in the input line. The variable `NF' is set to
the total number of fields in the input line.
References to nonexistent fields (i.e., fields after `$NF') return
the null-string. However, assigning to a nonexistent field (e.g.,
`$(NF+2) = 5') increases the value of `NF', creates any intervening
fields with the null string as their value, and causes the value of
`$0' to be recomputed, with the fields being separated by the value of
`OFS'.
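The following sketch shows both behaviors on a made-up three-field line, with `OFS' set to `-' so that the rebuilt record is easy to see.

```shell
# Reading $(NF+1) yields the null string; assigning to $(NF+2) extends
# the record and rebuilds $0 using OFS.
printf 'a b c\n' | awk 'BEGIN { OFS = "-" } {
    print "[" $(NF+1) "]"
    $(NF+2) = 5
    print NF
    print
}'
```

This prints `[]', then `5', then `a-b-c--5'. The doubled `-' surrounds the intervening fourth field, which was created with the null string as its value.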
*Note Reading Input Files: Reading Files, for a full description of
the way `awk' defines and uses fields.
File: gawk.info, Node: Built-in Summary, Next: Arrays Summary, Prev: Fields Summary, Up: Variables/Fields
Built-in Variables
------------------
`awk''s built-in variables are:
`ARGC'
The number of command line arguments (not including options or the
`awk' program itself).
`ARGV'
The array of command line arguments. The array is indexed from 0
to `ARGC' - 1. Dynamically changing the contents of `ARGV' can
control the files used for data.
`CONVFMT'
The conversion format to use when converting numbers to strings.
`FIELDWIDTHS'
A space separated list of numbers describing the fixed-width input
data.
`ENVIRON'
An array containing the values of the environment variables. The
array is indexed by variable name, each element being the value of
that variable. Thus, the environment variable `HOME' would be in
`ENVIRON["HOME"]'. Its value might be `/u/close'.
Changing this array does not affect the environment seen by
programs which `gawk' spawns via redirection or the `system'
function. (This may change in a future version of `gawk'.)
Some operating systems do not have environment variables. The
array `ENVIRON' is empty when running on these systems.
`FILENAME'
The name of the current input file. If no files are specified on
the command line, the value of `FILENAME' is `-'.
`FNR'
The input record number in the current input file.
`FS'
The input field separator, a blank by default.
`IGNORECASE'
The case-sensitivity flag for regular expression operations. If
`IGNORECASE' has a nonzero value, then pattern matching in rules,
field splitting with `FS', regular expression matching with `~'
and `!~', and the `gsub', `index', `match', `split' and `sub'
predefined functions all ignore case when doing regular expression
operations.
`NF'
The number of fields in the current input record.
`NR'
The total number of input records seen so far.
`OFMT'
The output format for numbers for the `print' statement, `"%.6g"'
by default.
`OFS'
The output field separator, a blank by default.
`ORS'
The output record separator, by default a newline.
`RS'
The input record separator, by default a newline. `RS' is
exceptional in that only the first character of its string value
is used for separating records. If `RS' is set to the null
string, then records are separated by blank lines. When `RS' is
set to the null string, then the newline character always acts as
a field separator, in addition to whatever value `FS' may have.
`RSTART'
The index of the first character matched by `match'; 0 if no match.
`RLENGTH'
The length of the string matched by `match'; -1 if no match.
`SUBSEP'
The string used to separate multiple subscripts in array elements,
by default `"\034"'.
*Note Built-in Variables::, for more information.
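A minimal sketch of how a few of these variables interact, assuming
colon-separated sample input supplied by the shell's `printf' utility.
The function name `demo_builtin_vars' is hypothetical.

```shell
# Hypothetical helper: show FS, OFS, NR and NF working together.
demo_builtin_vars() {
    printf 'a:b\nc:d:e\n' | awk '
    BEGIN { FS = ":"; OFS = "-" }   # split input on ":", join output with "-"
    {
        $1 = $1              # assigning to a field recomputes $0 using OFS
        print NR, NF, $0     # print also separates its arguments with OFS
    }'
}
demo_builtin_vars
```

This should print `1-2-a-b' followed by `2-3-c-d-e': the record number,
the field count, and the rebuilt record.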
File: gawk.info, Node: Arrays Summary, Next: Data Type Summary, Prev: Built-in Summary, Up: Variables/Fields
Arrays
------
Arrays are subscripted with an expression between square brackets
(`[' and `]'). Array subscripts are *always* strings; numbers are
converted to strings as necessary, following the standard conversion
rules (*note Conversion of Strings and Numbers: Conversion.).
If you use multiple expressions separated by commas inside the square
brackets, then the array subscript is a string consisting of the
concatenation of the individual subscript values, converted to strings,
separated by the subscript separator (the value of `SUBSEP').
The special operator `in' may be used in an `if' or `while'
statement to see if an array has an index consisting of a particular
value.
if (val in array)
print array[val]
If the array has multiple subscripts, use `(i, j, ...) in array' to
test for existence of an element.
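The following sketch illustrates multiple subscripts and the `in'
test. The array name `grid' and the function name
`demo_multi_subscript' are hypothetical. The loop uses `split' with
`SUBSEP' to recover the individual subscripts from the combined key.

```shell
# Hypothetical helper: store and test an element with two subscripts.
demo_multi_subscript() {
    awk 'BEGIN {
        grid[1, 2] = "x"                  # key is "1" SUBSEP "2"
        if ((1, 2) in grid) print "found"
        if (!((2, 1) in grid)) print "missing"
        for (key in grid) {               # recover the two subscripts
            split(key, part, SUBSEP)
            print part[1] "," part[2]
        }
    }'
}
demo_multi_subscript
```

Testing with `in' does not create the element being tested, so the
loop visits only the one element that was assigned.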
The `in' construct may also be used in a `for' loop to iterate over
all the elements of an array. *Note Scanning all Elements of an Array:
Scanning an Array.
An element may be deleted from an array using the `delete' statement.
*Note Arrays in `awk': Arrays, for more detailed information.
File: gawk.info, Node: Data Type Summary, Prev: Arrays Summary, Up: Variables/Fields
Data Types
----------
The value of an `awk' expression is always either a number or a
string.
Certain contexts (such as arithmetic operators) require numeric
values. They convert strings to numbers by interpreting the text of
the string as a numeral. If the string does not look like a numeral,
it converts to 0.
Certain contexts (such as concatenation) require string values. They
convert numbers to strings by effectively printing them with `sprintf'.
*Note Conversion of Strings and Numbers: Conversion, for the details.
To force conversion of a string value to a number, simply add 0 to
it. If the value you start with is already a number, this does not
change it.
To force conversion of a numeric value to a string, concatenate it
with the null string.
The `awk' language defines comparisons as being done numerically if
both operands are numeric, or if one is numeric and the other is a
numeric string. Otherwise one or both operands are converted to
strings and a string comparison is performed.
Uninitialized variables have the string value `""' (the null, or
empty, string). In contexts where a number is required, this is
equivalent to 0.
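A short sketch of these conversion rules follows. The variable names
and the function name `demo_conversion' are hypothetical. Note that a
string constant such as `"10"' is an ordinary string, not a numeric
string, so comparing it with a number is done as a string comparison.

```shell
# Hypothetical helper: forced conversions and their effect on comparison.
demo_conversion() {
    awk 'BEGIN {
        s = "10"; n = 9
        print (s < n)        # 1: "10" and "9" are compared as strings
        print (s + 0 < n)    # 0: adding 0 forces s to the number 10
        print ("abc" + 0)    # 0: the string does not look like a numeral
        str = n ""           # concatenating the null string forces a string
        print (str < 10)     # 0: "9" and "10" are compared as strings
        a = (unset == 0); b = (unset == "")
        print a, b           # 1 1: an uninitialized variable acts as 0 and ""
    }'
}
demo_conversion
```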
*Note Variables::, for more information on variable naming and
initialization; *note Conversion of Strings and Numbers: Conversion.,
for more information on how variable values are interpreted.
File: gawk.info, Node: Rules Summary, Next: Functions Summary, Prev: Variables/Fields, Up: Gawk Summary
Patterns and Actions
====================
* Menu:
* Pattern Summary:: Quick overview of patterns.
* Regexp Summary:: Quick overview of regular expressions.
* Actions Summary:: Quick overview of actions.
An `awk' program is mostly composed of rules, each consisting of a
pattern followed by an action. The action is enclosed in `{' and `}'.
Either the pattern may be missing, or the action may be missing, but,
of course, not both. If the pattern is missing, the action is executed
for every single line of input. A missing action is equivalent to this
action,
{ print }
which prints the entire line.
Comments begin with the `#' character, and continue until the end of
the line. Blank lines may be used to separate statements. Normally, a
statement ends with a newline; however, this is not the case for lines
ending in a `,', `{', `?', `:', `&&', or `||'. Lines ending in `do' or
`else' also have their statements automatically continued on the
following line. In other cases, a line can be continued by ending it
with a `\', in which case the newline is ignored.
Multiple statements may be put on one line by separating them with a
`;'. This applies to both the statements within the action part of a
rule (the usual case), and to the rule statements.
*Note Comments in `awk' Programs: Comments, for information on
`awk''s commenting convention; *note `awk' Statements versus Lines:
Statements/Lines., for a description of the line continuation mechanism
in `awk'.
File: gawk.info, Node: Pattern Summary, Next: Regexp Summary, Prev: Rules Summary, Up: Rules Summary
Patterns
--------
`awk' patterns may be one of the following:
/REGULAR EXPRESSION/
RELATIONAL EXPRESSION
PATTERN && PATTERN
PATTERN || PATTERN
PATTERN ? PATTERN : PATTERN
(PATTERN)
! PATTERN
PATTERN1, PATTERN2
BEGIN
END
`BEGIN' and `END' are two special kinds of patterns that are not
tested against the input. The action parts of all `BEGIN' rules are
merged as if all the statements had been written in a single `BEGIN'
rule. They are executed before any of the input is read. Similarly,
all the `END' rules are merged, and executed when all the input is
exhausted (or when an `exit' statement is executed). `BEGIN' and `END'
patterns cannot be combined with other patterns in pattern expressions.
`BEGIN' and `END' rules cannot have missing action parts.
For `/REGULAR-EXPRESSION/' patterns, the associated statement is
executed for each input line that matches the regular expression.
Regular expressions are extensions of those in `egrep', and are
summarized below.
A RELATIONAL EXPRESSION may use any of the operators defined below in
the section on actions. These generally test whether certain fields
match certain regular expressions.
The `&&', `||', and `!' operators are logical "and," logical "or,"
and logical "not," respectively, as in C. They do short-circuit
evaluation, also as in C, and are used for combining more primitive
pattern expressions. As in most languages, parentheses may be used to
change the order of evaluation.
The `?:' operator is like the same operator in C. If the first
pattern matches, then the second pattern is matched against the input
record; otherwise, the third is matched. Only one of the second and
third patterns is matched.
The `PATTERN1, PATTERN2' form of a pattern is called a range
pattern. It matches all input lines starting with a line that matches
PATTERN1, and continuing until a line that matches PATTERN2, inclusive.
A range pattern cannot be used as an operand to any of the pattern
operators.
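As an illustrative sketch, the program below combines `BEGIN', a range
pattern, and `END'. The sample input and the function name
`demo_range' are hypothetical.

```shell
# Hypothetical helper: print and count the lines selected by a range pattern.
demo_range() {
    printf 'a\nstart\nb\nstop\nc\n' | awk '
    BEGIN           { count = 0 }         # runs before any input is read
    /start/, /stop/ { count++; print }    # both boundary lines are included
    END             { print count " lines in range" }'
}
demo_range
```

The lines `start', `b', and `stop' are printed, followed by the count
from the `END' rule.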
*Note Patterns::, for a full description of the pattern part of `awk'
rules.
File: gawk.info, Node: Regexp Summary, Next: Actions Summary, Prev: Pattern Summary, Up: Rules Summary
Regular Expressions
-------------------
Regular expressions are the extended kind found in `egrep'. They are
composed of characters as follows:
`C'
matches the character C (assuming C is a character with no special
meaning in regexps).
`\C'
matches the literal character C.
`.'
matches any character except newline.
`^'
matches the beginning of a line or a string.
`$'
matches the end of a line or a string.
`[ABC...]'
matches any of the characters ABC... (character class).
`[^ABC...]'
matches any character except ABC... and newline (negated character
class).
`R1|R2'
matches either R1 or R2 (alternation).
`R1R2'
matches R1, and then R2 (concatenation).
`R+'
matches one or more R's.
`R*'
matches zero or more R's.
`R?'
matches zero or one R's.
`(R)'
matches R (grouping).
*Note Regular Expressions as Patterns: Regexp, for a more detailed
explanation of regular expressions.
The escape sequences allowed in string constants are also valid in
regular expressions (*note Constant Expressions: Constants.).
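A brief sketch combining anchors, grouping, and repetition: the regexp
below matches lines consisting of one or more copies of `ab', optionally
followed by a single `c'. The sample input and the function name
`demo_regexp' are hypothetical.

```shell
# Hypothetical helper: a pattern with no action prints each matching line.
demo_regexp() {
    printf 'abab\nabc\nababc\nac\nabcc\n' | awk '/^(ab)+c?$/'
}
demo_regexp
```

Only `abab', `abc', and `ababc' are printed. The line `ac' has no
leading `ab', and `abcc' has one `c' too many.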
File: gawk.info, Node: Actions Summary, Prev: Regexp Summary, Up: Rules Summary
Actions
-------
Action statements are enclosed in braces, `{' and `}'. Action
statements consist of the usual assignment, conditional, and looping
statements found in most languages. The operators, control statements,
and input/output statements available are patterned after those in C.
* Menu:
* Operator Summary:: `awk' operators.
* Control Flow Summary:: The control statements.
* I/O Summary:: The I/O statements.
* Printf Summary:: A summary of `printf'.
* Special File Summary:: Special file names interpreted internally.
* Numeric Functions Summary:: Built-in numeric functions.
* String Functions Summary:: Built-in string functions.
* Time Functions Summary:: Built-in time functions.
* String Constants Summary:: Escape sequences in strings.
File: gawk.info, Node: Operator Summary, Next: Control Flow Summary, Prev: Actions Summary, Up: Actions Summary
Operators
.........
The operators in `awk', in order of increasing precedence, are:
`= += -= *= /= %= ^='
Assignment. Both absolute assignment (`VAR=VALUE') and operator
assignment (the other forms) are supported.
`?:'
A conditional expression, as in C. This has the form `EXPR1 ?
EXPR2 : EXPR3'. If EXPR1 is true, the value of the expression is
EXPR2; otherwise it is EXPR3. Only one of EXPR2 and EXPR3 is
evaluated.
`||'
Logical "or".
`&&'
Logical "and".
`~ !~'
Regular expression match, negated match.
`< <= > >= != =='
The usual relational operators.
`BLANK'
String concatenation.
`+ -'
Addition and subtraction.
`* / %'
Multiplication, division, and modulus.
`+ - !'
Unary plus, unary minus, and logical negation.
`^'
Exponentiation (`**' may also be used, and `**=' for the assignment
operator, but they are not specified in the POSIX standard).
`++ --'
Increment and decrement, both prefix and postfix.
`$'
Field reference.
*Note Expressions as Action Statements: Expressions, for a full
description of all the operators listed above. *Note Examining Fields:
Fields, for a description of the field reference operator.
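The following sketch shows a few consequences of this precedence
ordering. The function name `demo_precedence' is hypothetical.

```shell
# Hypothetical helper: expressions whose results depend on precedence.
demo_precedence() {
    awk 'BEGIN {
        print -2 ^ 2                     # -4: ^ binds tighter than unary minus
        print 2 * 3 ^ 2                  # 18: ^ binds tighter than *
        print 1 " " 2 + 3                # 1 5: + binds tighter than concatenation
        x = 1
        print (x == 1 ? "yes" : "no")    # yes: == is evaluated before ?:
    }'
}
demo_precedence
```

When in doubt, parentheses make the intended grouping explicit.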
File: gawk.info, Node: Control Flow Summary, Next: I/O Summary, Prev: Operator Summary, Up: Actions Summary
Control Statements
..................
The control statements are as follows:
if (CONDITION) STATEMENT [ else STATEMENT ]
while (CONDITION) STATEMENT
do STATEMENT while (CONDITION)
for (EXPR1; EXPR2; EXPR3) STATEMENT
for (VAR in ARRAY) STATEMENT
break
continue
delete ARRAY[INDEX]
exit [ EXPRESSION ]
{ STATEMENTS }
*Note Control Statements in Actions: Statements, for a full
description of all the control statements listed above.
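A small sketch exercising several of these statements together. The
array name `seen' and the function name `demo_control' are
hypothetical.

```shell
# Hypothetical helper: delete, for-in, do-while, and break in one program.
demo_control() {
    awk 'BEGIN {
        seen["a"] = 1; seen["b"] = 1
        delete seen["a"]              # remove one element
        n = 0
        for (key in seen) n++         # visits only the remaining element
        i = 0
        do { i++ } while (i < 3)      # body runs before the first test
        for (j = 0; j < 10; j++)
            if (j == 4) break         # leave the loop early
        print n, i, j
    }'
}
demo_control
```

This should print `1 3 4'.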
File: gawk.info, Node: I/O Summary, Next: Printf Summary, Prev: Control Flow Summary, Up: Actions Summary
I/O Statements
..............
The input/output statements are as follows:
`getline'
Set `$0' from next input record; set `NF', `NR', `FNR'.
`getline <FILE'
Set `$0' from next record of FILE; set `NF'.
`getline VAR'
Set VAR from next input record; set `NR', `FNR'.
`getline VAR <FILE'
Set VAR from next record of FILE.
`next'
Stop processing the current input record. The next input record
is read and processing starts over with the first pattern in the
`awk' program. If the end of the input data is reached, the `END'
rule(s), if any, are executed.
`next file'
Stop processing the current input file. The next input record
read comes from the next input file. `FILENAME' is updated, `FNR'
is set to 1, and processing starts over with the first pattern in
the `awk' program. If the end of the input data is reached, the
`END' rule(s), if any, are executed.
`print'
Prints the current record.
`print EXPR-LIST'
Prints expressions.
`print EXPR-LIST > FILE'
Prints expressions on FILE.
`printf FMT, EXPR-LIST'
Format and print.
`printf FMT, EXPR-LIST > FILE'
Format and print on FILE.
Other input/output redirections are also allowed. For `print' and
`printf', `>> FILE' appends output to the FILE, and `| COMMAND' writes
on a pipe. In a similar fashion, `COMMAND | getline' pipes input into
`getline'. `getline' returns 0 on end of file, and -1 on an error.
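As a sketch of the `COMMAND | getline' form, the loop below reads
records from a pipe until `getline' stops returning a positive value.
The command string and the function name `demo_getline_pipe' are
hypothetical. The `close' function, which is not described in this
summary, is called to shut the pipe once reading is finished.

```shell
# Hypothetical helper: count the records produced by a command.
demo_getline_pipe() {
    awk 'BEGIN {
        cmd = "echo one; echo two"
        n = 0
        while ((cmd | getline line) > 0)   # 0 is end of file, -1 is an error
            n++
        close(cmd)
        print n
    }'
}
demo_getline_pipe
```

Testing for `> 0', rather than for a nonzero value, keeps the loop from
running forever if `getline' returns -1.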
*Note Explicit Input with `getline': Getline, for a full description
of the `getline' statement. *Note Printing Output: Printing, for a full
description of `print' and `printf'. Finally, *note The `next'
Statement: Next Statement., for a description of how the `next'
statement works.
File: gawk.info, Node: Printf Summary, Next: Special File Summary, Prev: I/O Summary, Up: Actions Summary
`printf' Summary
................
The `awk' `printf' statement and `sprintf' function accept the
following conversion specification formats:
`%c'
An ASCII character. If the argument used for `%c' is numeric, it
is treated as a character and printed. Otherwise, the argument is
assumed to be a string, and only the first character of that
string is printed.
`%d'
A decimal number (the integer part).
`%e'
A floating point number of the form `[-]d.ddddddE[+-]dd'.
`%f'
A floating point number of the form [`-']`ddd.dddddd'.
`%g'
Use `%e' or `%f' conversion, whichever produces a shorter string,
with nonsignificant zeros suppressed.
`%o'
An unsigned octal number (again, an integer).
`%s'
A character string.
`%x'
An unsigned hexadecimal number (an integer).
`%X'
Like `%x', except use `A' through `F' instead of `a' through `f'
for decimal 10 through 15.
`%%'
A single `%' character; no argument is converted.
There are optional, additional parameters that may lie between the
`%' and the control letter:
`-'
The expression should be left-justified within its field.
`WIDTH'
The field should be padded to this width. If WIDTH has a leading
zero, then the field is padded with zeros. Otherwise it is padded
with blanks.
`.PREC'
A number indicating the maximum width of strings or digits to the
right of the decimal point.
Either or both of the WIDTH and PREC values may be specified as `*'.
In that case, the particular value is taken from the argument list.
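The sketch below applies several of these formats and modifiers in one
`printf' statement. The argument values and the function name
`demo_printf' are hypothetical.

```shell
# Hypothetical helper: left-justification, width, precision, zero padding.
demo_printf() {
    awk 'BEGIN {
        # %-5s  left-justify "ab" in 5 columns
        # %5.2f two digits after the decimal point, padded to 5 columns
        # %05d  pad with zeros to 5 columns
        # %x    hexadecimal; %c with a numeric argument prints a character
        printf "%-5s|%5.2f|%05d|%x|%c\n", "ab", 3.14159, 42, 255, 65
    }'
}
demo_printf
```

On an ASCII system this should print `ab   | 3.14|00042|ff|A'.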
*Note Using `printf' Statements for Fancier Printing: Printf, for
examples and for a more detailed description.
File: gawk.info, Node: Special File Summary, Next: Numeric Functions Summary, Prev: Printf Summary, Up: Actions Summary
Special File Names
..................
When doing I/O redirection from either `print' or `printf' into a
file, or via `getline' from a file, `gawk' recognizes certain special
file names internally. These file names allow access to open file
descriptors inherited from `gawk''s parent process (usually the shell).
The file names are:
`/dev/stdin'
The standard input.
`/dev/stdout'
The standard output.
`/dev/stderr'
The standard error output.
`/dev/fd/N'
The file denoted by the open file descriptor N.
These file names may also be used on the command line to name data
files.
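A minimal sketch of redirecting output to one of these special file
names. The messages and the function name `demo_stderr' are
hypothetical.

```shell
# Hypothetical helper: write one line to each of the two output streams.
demo_stderr() {
    awk 'BEGIN {
        print "result"                     # goes to standard output
        print "warning" > "/dev/stderr"    # goes to standard error
    }'
}
demo_stderr
```

Running `demo_stderr 2>/dev/null' from the shell shows only `result',
since the warning is written to the standard error output.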
*Note Standard I/O Streams: Special Files, for a longer description
that provides the motivation for this feature.